feat(scheduler): mms send scaling request when model scheduling fails #6235

Draft
wants to merge 5 commits into
base: v2
from

Conversation

@driev driev commented Jan 28, 2025

What this PR does / why we need it:

When there are fewer server replicas than desired (expected) model replicas, i.e. when a model is only partially available, the scheduler needs to send a scaling request to the operator to increase the number of server replicas.
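
For reviewers, here is a minimal sketch of the trigger condition described above. It is not the code in this PR: `ScaleRequest`, `scaleUpRequest`, and the server name `mlserver` are hypothetical names chosen for illustration, and the sketch assumes each model replica needs its own server replica.

```go
package main

import "fmt"

// ScaleRequest is a hypothetical message the scheduler could send to the
// operator; the real message type in this PR may differ.
type ScaleRequest struct {
	Server   string // name of the server to scale
	Replicas int    // number of server replicas being requested
}

// scaleUpRequest returns a scaling request when the server has fewer replicas
// than the model wants. It assumes one server replica per model replica.
// The boolean is false when no scaling is needed.
func scaleUpRequest(server string, serverReplicas, desiredModelReplicas int) (ScaleRequest, bool) {
	if serverReplicas >= desiredModelReplicas {
		// Enough server replicas: the model can be fully scheduled.
		return ScaleRequest{}, false
	}
	// Too few server replicas: ask for enough to host every model replica.
	return ScaleRequest{Server: server, Replicas: desiredModelReplicas}, true
}

func main() {
	// Two server replicas, three desired model replicas: partially available.
	if req, ok := scaleUpRequest("mlserver", 2, 3); ok {
		fmt.Printf("requesting %d replicas of %s\n", req.Replicas, req.Server)
	}
}
```

Returning a boolean alongside the request keeps the "nothing to do" case explicit, so the caller only contacts the operator when scheduling actually fell short.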

Which issue(s) this PR fixes:

Fixes INFRA-1048

Special notes for your reviewer:

  • kind test

@driev driev requested review from sakoush and lc525 as code owners January 28, 2025 16:45
@driev driev added the v2 label Jan 28, 2025
@driev driev marked this pull request as draft January 28, 2025 16:49